Large Scale Image Auto-Annotation
Introduction
With the explosive growth of social networks and image-sharing communities, vast numbers of images with few or no tags are being posted on the Web. This tag incompleteness poses a great challenge to keyword-based image retrieval methods and systems. Data-Driven Image Auto-Annotation (DD-IAA) is a promising approach to extracting semantic concepts (i.e., high-level textual descriptions or tags) from images. To predict image tags automatically, DD-IAA learns the latent mapping between the semantic concept space and the visual feature space, leveraging the Web as a near-infinite semantic repository and knowledge base and drawing on techniques from data mining, machine learning, computer vision, etc. DD-IAA is a newly emerged research area involving many fundamental theories and practical techniques, which makes research on it both theoretically meaningful and practically useful. However, existing image auto-annotation methods are not yet practical enough to support the tagging of large-scale Web images. To tackle this challenge, in this project we will make an in-depth study of DD-IAA. Our major research content includes, but is not limited to, building and maintaining a Web image knowledge base, tag processing for Web images, and candidate tag selection and propagation. With the key techniques and algorithms we develop, we will implement a semantic-aware image retrieval system based on DD-IAA. We aim to make theoretical achievements, develop novel techniques, and lay a solid foundation in both theory and technique for this field.
Framework
- Collecting web images
- Building/maintaining a web image knowledge base
- Candidate tag selection and propagation for DD-IAA
- Developing demo systems
As illustrated in Fig. 1, the framework of this project consists of four parts, i.e., collecting web images, building/maintaining a web image knowledge base, investigating methods of candidate tag selection and propagation for DD-IAA, and developing demo systems.
Collecting millions or even billions of web images with their associated textual tags or metadata is the foundation of constructing a web image knowledge base. We have developed various crawlers to collect web images from image-sharing communities like Flickr and other web image sources like Wikipedia.
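As an illustration only (the project's actual crawlers target sites such as Flickr and Wikipedia, with API access, pagination, and rate limiting), a minimal sketch of the extraction step might pull image URLs and crude candidate tag words out of a crawled HTML page, here using each image's `alt` text as a stand-in for user-supplied tags:

```python
import re
from html.parser import HTMLParser

class ImageTagCollector(HTMLParser):
    """Collect (image URL, candidate tag words) pairs from one crawled page.

    A toy sketch: a production crawler would also gather titles, comments,
    EXIF metadata, and user tags, and deduplicate across pages."""
    def __init__(self):
        super().__init__()
        self.images = []  # list of (src, [tag words])

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src")
        # Use the alt text as a crude source of candidate tag words.
        tags = re.findall(r"[a-z]+", a.get("alt", "").lower())
        if src:
            self.images.append((src, tags))

collector = ImageTagCollector()
collector.feed('<img src="cat.jpg" alt="A sleeping cat"> <img src="dog.jpg" alt="dog park">')
print(collector.images)
```

The (URL, words) pairs produced this way would then feed the data cleaning and tag processing stages described below.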
With web images collected, we focus on investigating feasible methods to build an open, well-organized, extensible, and large-scale web image knowledge base. To do so, we first need to perform data cleaning and integration for images collected from multiple heterogeneous data sources. Second, as the existing tags of collected web images are generally imprecise and incomplete, we need to develop effective and efficient tag processing methods for tag denoising, completion, ranking, etc. Moreover, as the knowledge base is expected to be extensible and dynamic, bursty semantic concepts should be detected and added to the vocabulary for later auto-annotation. Finally, since DD-IAA relies heavily on the quality of the retrieved visual neighbours of a to-be-annotated image, we also develop metric learning methods to better measure image-image similarities, and efficient indexing methods and structures (e.g., hashing methods) to cut retrieval costs while keeping acceptable performance.
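To sketch the hashing idea mentioned above, random-hyperplane LSH is one standard choice (the project's own indexing methods may well differ): each feature vector is reduced to a short binary code, and the number of matching bits between two codes tracks the angle between the original vectors, so neighbour retrieval can run over compact codes instead of raw features. The feature vectors below are hypothetical toy values:

```python
import random

def lsh_signature(vec, planes):
    """Random-hyperplane LSH: one bit per hyperplane, the sign of the dot product."""
    return [int(sum(v * p for v, p in zip(vec, plane)) >= 0) for plane in planes]

def hamming_agreement(sig_x, sig_y):
    """Number of matching bits; visually similar vectors agree on more hyperplanes."""
    return sum(bx == by for bx, by in zip(sig_x, sig_y))

random.seed(0)                       # fixed seed so the example is reproducible
dim, n_bits = 4, 64                  # toy 4-d features, 64-bit codes
planes = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

a = [1.0, 0.9, 0.0, 0.1]             # hypothetical image feature vectors
b = [0.9, 1.0, 0.1, 0.0]             # close to a in feature space
c = [-1.0, 0.0, 0.9, -0.8]           # far from a in feature space

sig = {name: lsh_signature(v, planes) for name, v in [("a", a), ("b", b), ("c", c)]}
print(hamming_agreement(sig["a"], sig["b"]), hamming_agreement(sig["a"], sig["c"]))
```

With real image features the same scheme applies, but the codes would typically be bucketed into hash tables so that only images sharing a bucket are compared, which is what cuts the retrieval cost.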
For DD-IAA methods, annotation performance depends not only on the quality of the visual neighbours but also on the tag selection and propagation strategy, which determines which of the tags appearing among the visual neighbours should be assigned to the image. Hence, we concentrate on developing novel methods of candidate tag selection and propagation for DD-IAA, addressing remaining challenges such as class imbalance and the combination of discriminative and generative models.
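A common baseline for this step (a sketch only, not the project's proposed methods) is distance-weighted neighbour voting: rank the visual neighbours of the query image, let each neighbour vote for its tags with a weight that decays with visual distance, and keep the top-scoring candidates. All feature vectors and tags below are made-up toy data:

```python
from collections import Counter
from math import sqrt

def propagate_tags(query_feat, neighbours, k=3, n_tags=2):
    """Baseline distance-weighted tag voting over the k nearest visual neighbours.

    neighbours: list of (feature_vector, [tags]) pairs from the knowledge base."""
    dist = lambda x, y: sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    ranked = sorted(neighbours, key=lambda n: dist(query_feat, n[0]))[:k]
    votes = Counter()
    for feat, tags in ranked:
        w = 1.0 / (1.0 + dist(query_feat, feat))  # closer neighbours vote louder
        for t in tags:
            votes[t] += w
    return [t for t, _ in votes.most_common(n_tags)]

neighbours = [
    ([0.1, 0.9], ["beach", "sea"]),
    ([0.2, 0.8], ["sea", "sky"]),
    ([0.9, 0.1], ["city", "night"]),
]
print(propagate_tags([0.15, 0.85], neighbours))
```

The challenges noted above show up directly in this baseline: frequent tags dominate the vote regardless of relevance (class imbalance), which is one motivation for combining such generative-style voting with discriminative per-tag models.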
With the algorithms and techniques developed above, we are designing and building demo systems that demonstrate their effectiveness, which will probably be published online later on. We also hope that in the near future these algorithms and techniques can be applied to more challenging real-world image auto-annotation problems.
Publications:
- Zijia Lin, Guiguang Ding, Mingqing Hu, Yunzhen Lin, Shuzhi Sam Ge: Image Tag Completion via Dual-view Linear Sparse Reconstructions. Computer Vision and Image Understanding (Accepted to appear, 2014)
- Zijia Lin, Guiguang Ding, Mingqing Hu: Image Auto-annotation via Tag-dependent Random Search over Range-constrained Visual Neighbours. Multimedia Tools and Applications (Jan. 2014)
- Zijia Lin, Guiguang Ding, Mingqing Hu, Jianmin Wang, Xiaojun Ye: Image Tag Completion via Image-Specific and Tag-Specific Linear Sparse Reconstructions. CVPR 2013: 1618-1625
- Zijia Lin, Guiguang Ding, Mingqing Hu, Jianmin Wang, Jiaguang Sun: Automatic Image Annotation using Tag-Related Random Search over Visual Neighbors. CIKM 2012: 1784-1788
Patents:
- Guiguang Ding, Zijia Lin: Image Tag Completion via Optimal Linear Sparse Reconstructions. Patent in China. CN103218460A (2013.07.24) [丁贵广、林梓佳:基于最优线性稀疏重构的图像标签补全方法,公开号CN103218460A,公开日期2013.07.24]
- Guiguang Ding, Zijia Lin: Image Auto-annotation via Random Search on Directed Graphs. Patent in China. CN102298605A (2011.12.28) [丁贵广、林梓佳:基于有向图非等概率随机搜索的图像自动标注方法及装置,公开号CN102298605A,公开日期2011.12.28]